ABSTRACT

Although the criterion problem has been acknowledged 
as critical in personnel research, few attempts have been made to 
systematically examine the nature and covariates of criterion 
measures of performance. The present research used meta-analytic 
techniques to examine the race effect size for objective measures of 
performance and to compare the relationship between effect sizes for 
objective indices and subjective ratings. Fifty-three studies were 
located that included at least one objective index of actual 
performance, absenteeism or cognitive test performance and one 
subjective measure of performance for the same group of black and 
white employees. The corrected average effect sizes across the 53 
studies were relatively low but quite similar for the objective and 
subjective criteria. Moderating effects for the objective criteria 
were found as race effects were much higher for cognitive than for 
performance criteria. Subjective ratings had a lower effect size than 
objective cognitive test scores but were higher than comparable 
objective performance indices. The implications of the results for 
personnel research practices were discussed and the need for a better 
understanding of the constructs underlying criterion measures was
emphasized. (Author)
The Study of Race Effects in Objective Indices
and Subjective Evaluations of Performance: 
A Meta-Analysis of Performance Criteria 
A continual issue of concern for organizations is racial 
discrimination in personnel practices. Considerable research has 
revolved around hiring practices, issues of test validity and 
differential prediction for minority and nonminority applicants. 
The research indicates that the average test score tends to be lower 
for minorities but that predictor tests appear equally valid for 
both minority and majority group members (e.g., Bartlett, Bobko,
Mosier, & Hannan, 1978). These results have led researchers to
conclude that predictors such as cognitive ability tests are fair 
to minority applicants as they do not systematically underestimate 
the expected job performance of minority groups (Schmidt & Hunter, 
1981). 

The research on personnel testing has mainly focused on
predictor rather than criterion related issues. An implicit
assumption underlying this focus is that the criteria employed in
test validity and fairness studies are job relevant and unbiased.
This neglect of criterion issues is surprising given the long
standing concern of personnel and other applied psychologists over
the "criterion problem" (Cascio, 1982; Smith, 1976; Wallace, 1965).
The quality of criterion related validity and test fairness studies
is heavily dependent upon the appropriateness of the criterion used
(Burke, 1984; Scott & Hamner, 1975). Additionally, testing standards
require researchers to investigate the relevance of the criterion
and to study the possibility that irrelevant factors may cause 
criterion bias (American Psychological Association, Division of 
Industrial and Organizational Psychology, 1980). The purpose of 
the present study is to identify the criteria used in testing 
studies and to investigate the extent to which race effects are 
present in various types of performance criteria. 

Performance ratings are the most often used criterion in
validation studies (Landy & Farr, 1980; Schmitt, Gooding, Noe, &
Kirsch, 1984). Despite their widespread use, ratings have been
criticized as highly vulnerable to rater biases such as halo,
leniency and stereotyping. This vulnerability to intentional or
inadvertent racial bias has led researchers to question the
usefulness of ratings as criteria in test fairness studies. For
example, in a critique of the Educational Testing Service Project
on racial bias (Campbell, Crooks, Mahoney, & Rock, 1973), Anastasi
(1973) questioned the relevancy of ratings as criterion measures
for test validation when different ethnic groups were involved.
In another critique of the project, Wallace (1973) cited the lack
of relevance (low intercorrelations with work samples), rater bias
(a rater by ratee interaction) and the spurious nature of rating
reliability estimates as providing the final "stake" for the
internment of supervisory ratings as test validation criteria.

A recent meta-analysis of race effects in ratings by Kraiger 
and Ford (1985) provided a direct examination of the relationship 
between subjective ratings and race. The study revealed a
relatively low but consistent ratee race effect size and a rater
by ratee race interaction. White raters rated white ratees higher
than black ratees and black raters assigned higher ratings to blacks
than to whites. Moderator analyses of the ratings of white raters
revealed that rating scale format (behavior based/trait), rater
training (offered/not offered) and rating purpose (administrative/
research) had minimal impacts on the size of the race effects found.
Race effects, though, were more likely to be found in field than 
laboratory settings and the effect size was higher (favoring white 
ratees) when black ratees constituted a smaller percentage of the 
workforce. 

The Kraiger and Ford (1985) study was limited to the 
examination of race effects in subjective ratings and therefore 
could not directly isolate the relative contributions of ratee 
performance and rater biases to the rating differences found. The 
interaction effect found for race of rater and ratee suggests that 
some degree of bias is present in the ratings as both white and 
black raters evaluated many of the same ratees. Nevertheless, the 
effects found do not preclude the possibility that actual performance 
differences between races exist. Albright (1973), for example, has 
argued that it is premature to dismiss racial differences in ratings 
as due largely to bias without a comparison to more objective 
criterion measures such as turnover and productivity criteria which 
more closely reflect real life decisions and actions in organizations. 
Although the distinction between subjective and objective
criteria is problematic and somewhat arbitrary (Guion, 1981;
Muckler, 1982), a number of studies have examined the relationship
of performance ratings and objective measures of performance.
Laboratory and simulation studies have generally found strong
relationships between actual performance and performance ratings
(Bigoness, 1976; Borman, 1978; Schmitt & Lappin, 1980). The results
from field studies are more conflicting; some studies have found
low relationships between objective indices and subjective ratings
of job performance (Alexander & Wilkins, 1982; Hausman & Strupp, 1955;
Seashore, Indik, & Georgopolous, 1960) while other studies have found
more substantial relationships (Bass & Turner, 1973; Kirchner, 1960).

Two recent meta-analyses support the findings of low to
moderate relationships between subjective and objective measures.
Heneman (1983) examined the relationship of cost or profit related
criteria and overall effectiveness ratings across fourteen studies
and found a mean corrected correlation of .28 and a large 95%
confidence interval which included zero. Hunter (1983), viewing
ratings as the dependent variable, attempted to model the
relationships among ability tests, job knowledge tests, work
samples and performance ratings. The resultant multiple
correlation for the prediction of the supervisory ratings from
these objective sources of information was .42.

While the results of the above correlational studies indicate
that ratings and objective measures are related, there remains a
large amount of variance unaccounted for in both sets of measures.
Guion (1983) has suggested that the impact of exogenous variables,
such as ratee and rater characteristics and contextual factors,
must be included in any model that attempts to increase our
understanding of the relationships among objective and subjective
criteria. Similarly, Scott and Hamner (1975) and Mobley (1982)
have stressed the need for more research on objective and subjective
measures of performance and their relationship to possible
contaminating factors such as race and sex.

Researchers have continued to call for increased use of
objective measures of performance in personnel research, especially
measures that have utility value to the organization (Tenopyr &
Oeltjen, 1980; Zedeck & Cascio, 1984). An implicit and often
untested assumption underlying the use of objective measures is
that they are less prone to biases than subjective measures. This
orientation is demonstrated in the studies above which examined
the relevancy of ratings as a function of their relationship to
objective performance indices. While less prone to certain biases
inherent in more judgmental measures, objective measures are
contaminated to an unknown degree (Cascio & Valenzi, 1978). Unlike
the extensive investigation of biasing influences in subjective
ratings, there have been few systematic attempts to investigate the
nature and covariates of objective criterion measures. In
particular, research is lacking which examines the relationships
between employee race and various types of objective criterion measures.
The present study builds upon the work of Kraiger and Ford
(1985) and examines the relationship of race and criterion measures
through meta-analytic procedures. While the interpretation of
racial differences is problematic without some ultimate measure of
performance, differences in effect size may be more readily
interpretable as bias or relevance with multiple criterion measures.
Therefore, the two major goals of the study are to: (1) investigate
differences in race effect size among different types of objective
criteria; and (2) directly compare the relationship between effect
sizes for objective indices and subjective ratings of performance.

Method 

An attempt was made to locate, summarize and analyze the
results of all published studies and a number of unpublished
studies reporting at least one objective index and one subjective
rating of performance for the same sample of black and white
employees. A majority of the studies were used in a previous
analysis of race effects in performance ratings (Kraiger & Ford,
1985). Additional studies were located by systematically reviewing
the recent literature and by soliciting responses from researchers
active in test validation and performance assessment. In some
cases more than one sample possessing the above characteristics
was described in the same report or article. As a result, a total
of 53 samples (25 published and 28 unpublished) were located. A
complete list of studies is presented in the appendix.
Analysis

The meta-analysis cumulated point-biserial correlations
between ratee race (arbitrarily coded White = 1, Black = 0) and
objective indices of performance and subjective ratings in order
to compute mean effect sizes ($r_{pb}$) and variances
($\sigma^2_{pb}$) across studies. Point-biserial correlations were
typically calculated from either a t-test of group differences or
reported group means and standard deviations. An estimate of
variance due to sampling error ($\sigma^2_e$) and the population
variance for effect sizes ($\sigma^2_\rho$) were computed for both
subjective and objective criteria using procedures explained by
Hunter, Schmidt, and Jackson (1982). The estimated standard error
($\sigma_e$) was used to establish confidence intervals around the
appropriate $r_{pb}$ to test the hypothesis that $r_{pb} = 0$ in
the population.
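
The two conversions mentioned above can be sketched as follows. This is a minimal illustration using the textbook identity between an independent-samples t statistic and the point-biserial correlation; the function names are hypothetical, and the report does not state which exact formulas were applied to each study.

```python
import math


def r_pb_from_t(t, n_white, n_black):
    """Point-biserial r from an independent-samples (pooled-variance) t.

    With race coded White = 1, Black = 0, a positive t (whites higher)
    yields a positive correlation.
    """
    df = n_white + n_black - 2
    return t / math.sqrt(t * t + df)


def r_pb_from_means(mean_white, sd_white, n_white,
                    mean_black, sd_black, n_black):
    """Point-biserial r from reported group means and standard deviations.

    Assumption: the SDs are sample SDs (n - 1 denominator), so a pooled
    variance t statistic can be rebuilt and then converted to r.
    """
    df = n_white + n_black - 2
    pooled_var = ((n_white - 1) * sd_white ** 2
                  + (n_black - 1) * sd_black ** 2) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n_white + 1.0 / n_black))
    t = (mean_white - mean_black) / se_diff
    return r_pb_from_t(t, n_white, n_black)
```

Routing the means-and-SDs case through a t statistic keeps the two paths consistent, so a study reporting either kind of summary would receive the same effect size.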

Since the size of a point-biserial correlation is affected by
the relative proportions of the two groups, effect sizes for
individual studies were corrected for differences in subgroup
sample sizes prior to cumulation. Estimated sampling error was
then adjusted for this correction.
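
One common way to make this correction, shown here only as a sketch, rescales each point-biserial correlation to the value it would take with a 50/50 split between groups. The formula below is the standard unequal-split correction and is assumed rather than quoted from the report; the function name is hypothetical, and the matching adjustment to sampling error is not shown.

```python
import math


def correct_for_unequal_split(r, n_white, n_black):
    """Rescale a point-biserial r to an equal (50/50) subgroup split.

    a = sqrt(.25 / (p * q)), where p and q are the subgroup proportions;
    corrected r = a * r / sqrt((a**2 - 1) * r**2 + 1).
    With equal subgroups a = 1 and r is returned unchanged.
    """
    p = n_white / (n_white + n_black)
    q = 1.0 - p
    a = math.sqrt(0.25 / (p * q))
    return a * r / math.sqrt((a * a - 1.0) * r * r + 1.0)
```

Because most samples contained far more white than black employees, a correction of this kind would generally raise the observed correlations slightly.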

Coding of study characteristics and effect size calculations 
for the meta-analysis were completed by two of the authors and 
differences resolved through consensus of the three authors or 
recalculation of the appropriate statistic. 
Moderator Analyses

The population variance ($\sigma^2_\rho$) estimates actual
study-to-study variation in effect sizes with variance due to small
samples and unequal sample sizes removed. Hunter et al. (1982)
stated that this corrected variance may be trivial and due to
statistical artifacts or may be nontrivial and suggest possible
moderators. The two tests of triviality of the corrected variances
suggested by Hunter et al. (1982) revealed significant chi squares,
$\chi^2$(53, N = 10,222) = 434.97, p < .01, for the objective data and
$\chi^2$(53, N = 9,443) = 167.22, p < .01, for the subjective data, and
a small ratio of sampling error variance to true variance for the
sample of objective (.13) and subjective (.33) criteria. These
results suggested that the effects were non-trivial and supported
the investigation of potential moderators in the data.
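
The quantities involved in these two tests can be sketched with the "bare bones" formulas usually attributed to Hunter et al. (1982). The function name and the returned field names are hypothetical, the formulas are an assumed reading of that procedure, and the ratio is computed here against the observed variance, which is also an assumption.

```python
def bare_bones_summary(rs, ns):
    """Sample-size-weighted meta-analytic summary of correlations.

    rs: per-sample effect sizes (already corrected for unequal split)
    ns: per-sample total sample sizes
    """
    k = len(rs)
    total_n = sum(ns)
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Observed (weighted) variance of effect sizes across samples.
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone.
    var_err = (1.0 - mean_r ** 2) ** 2 * k / total_n
    # Residual ("population") variance, floored at zero.
    var_rho = max(var_obs - var_err, 0.0)
    # Homogeneity statistic: large values suggest moderators.
    chi_sq = total_n * var_obs / (1.0 - mean_r ** 2) ** 2
    return {
        "mean_r": mean_r,
        "var_obs": var_obs,
        "var_err": var_err,
        "var_rho": var_rho,
        "chi_sq": chi_sq,
        "error_ratio": var_err / var_obs if var_obs > 0 else float("inf"),
    }
```

Under these formulas the chi-square equals the number of samples divided by the error ratio. That appears broadly consistent with the values reported above (53 / 434.97 is about .12 and 53 / 167.22 is about .32), although the exact computation used in the study may differ.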

A total of 44 different objective criteria were used across 
the 53 samples. To compare effect sizes across criterion types, 
it was necessary to reduce this set of criteria to a smaller, more 
conceptually meaningful number of categories. Since an adequate 
classification of objective criteria was not found, a categorization 
system was developed.¹ For this task, seven advanced graduate
students free sorted the 44 criteria into categories. Five stable 
categories of criteria emerged from the sorting and were labeled 
as training tests, job knowledge tests, absenteeism and tardiness, 
direct performance (e.g., units produced, shortages) and indirect 
performance (e.g., accidents, customer complaints). Five additional 
graduate students independently resorted the 44 criteria into the
five derived categories. The high level of agreement (91%) 
resulting from the sorting task demonstrated the reliability of the 
criterion categories. The researchers examined the disagreements 
and came to a consensus as to the appropriate categorization of the 
criteria. 

The five categories were further reduced to three by combining
the indirect and direct performance measures into a performance
indices category and by combining the training and job knowledge
tests into a cognitive criteria category. To maintain independence
of observation, multiple objective criteria belonging to a particular
category were combined if they were related to the same subjective
rating. This combination of criteria reduced the sample size for
the moderator analysis from 53 to 49. The three categories of
performance indicators (N = 20), absenteeism (N = 13) and cognitive
criteria (N = 16) provided the small number of categories needed to
meaningfully examine differences in race effect size by criterion
category. It should be noted that analyses conducted with the five
criterion categories yielded results similar to those to be
presented for the three criterion categories.

For the subjective evaluations, overall ratings of effectiveness
were available from each of the 53 samples. Ten samples also
included a specific rating that matched the type of objective
criteria gathered in that study (e.g., a rating of job knowledge
and a test of job knowledge). When available, the specific ratings
were used in place of the overall effectiveness ratings in the
analyses to more closely match the objective criterion.² Each
subjective rating was matched by category with the objective
criterion in the same sample and average effect sizes across
samples were calculated. This allowed for the direct comparison
of objective and subjective effect sizes.

It should be noted that nearly all raters were white so that 
effect sizes for ratings by black raters could not be compared to 
objective data. In addition, few studies reported criterion 
reliability data to use in correcting for attenuation. The limited 
literature relevant to criterion reliability indicates that 
performance (except for repetitive jobs; Rothe, 1947; Rambo, Chomiak,
& Price, 1983) and absenteeism measures (Muchinsky, 1977) are
particularly unstable. Based on this literature, the reliabilities 
for objective measures of performance and absenteeism were set at a 
conservative level of .60. The reliability for cognitive tests was 
set at .80 (Hunter & Hunter, 1984) while the reliability for the 
performance ratings was estimated to be .70 (Kraiger & Ford, 1985). 
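
The effect of these assumed reliabilities can be illustrated with the classical correction for attenuation. The sketch below assumes that race is coded without error, so that only criterion unreliability is corrected; the dictionary keys and function name are hypothetical.

```python
import math

# Reliability values as stated in the text above.
CRITERION_RELIABILITY = {
    "performance": 0.60,
    "absenteeism": 0.60,
    "cognitive": 0.80,
    "rating": 0.70,
}


def correct_for_attenuation(r, criterion_type):
    """Divide an observed correlation by the square root of the
    criterion reliability (race assumed perfectly reliable)."""
    return r / math.sqrt(CRITERION_RELIABILITY[criterion_type])
```

Setting a lower reliability produces a larger upward correction, so the .60 value inflates performance and absenteeism effect sizes more than the .80 value inflates cognitive ones.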

Finally, moderating effects were shown by classifying the
studies into relevant subsamples and recomputing subsample $r_{pb}$'s,
$\sigma$'s and confidence intervals. Differences in subgroup effect sizes
were tested for significance by a procedure adapted from Rosenthal
and Rubin (1982) and previously used in the meta-analysis of race
effects (Kraiger & Ford, 1985). Rosenthal and Rubin (1982) have
shown that their derived quotient is distributed as the standard
normal deviate. A significant Z indicates that effect sizes
differ between at least two moderator subgroups.
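
The exact adaptation of the Rosenthal and Rubin (1982) procedure is not given here. As a rough illustration of the general idea only, the following sketch compares two effect sizes through Fisher's z transformation, which is a standard textbook test and not necessarily the quotient used in this study; the function name is hypothetical.

```python
import math


def z_for_difference(r1, n1, r2, n2):
    """Standard normal deviate for the difference between two
    independent correlations, using Fisher's z transformation.

    Each transformed correlation has approximate variance 1 / (n - 3).
    """
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se
```

An absolute value above 1.96 would indicate a difference at the .05 level (two-tailed) under this simplified test.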
Correlation of Effect Sizes

To further examine the relationship between objective and 
subjective criteria, the effect size for the objective measure and 
the subjective measure for each sample were correlated. A 
correlation of effect sizes was computed for the 53 samples overall 
and within the three criterion categories of performance, absenteeism 
and cognitive criteria. The results provide an indication of the 
covariation of race effect size between subjective and objective 
measures (i.e., the extent to which a sample with a large race 
effect size on a subjective rating tended to have a large race 
effect size on the objective criterion of performance). 
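
A minimal sketch of this step is an ordinary Pearson correlation computed over samples, where each sample contributes one objective and one subjective effect size. The function name is hypothetical, and any weighting by sample size that the study may have applied is not represented.

```python
import math


def pearson_r(xs, ys):
    """Unweighted Pearson correlation between two equal-length lists,
    e.g. objective and subjective race effect sizes, one pair per sample."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Running the same function on the subset of samples in each criterion category would give the within-category correlations.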

Results 

The results of the meta-analysis are presented in Table 1. 
The table shows sample sizes for whites and blacks, total sample 
sizes, corrected effect sizes, variance estimates, and confidence 
intervals for the 53 studies with an objective and subjective 
measure of performance. Information is also presented for the
three criterion categories of performance indicators, absenteeism 
and cognitive criteria. 

Insert Table 1 about here 

The best estimate of the population effect size is the mean
point-biserial correlation corrected for unreliability. For the
objective indices, this estimate was .209 based on a sample of
10,222 (7,405 whites and 2,817 blacks) employees. For the
subjective ratings, the mean point-biserial correlation was quite
similar (.204) and was based on a total sample size of 9,443 (6,791
whites and 2,652 blacks). The 95% confidence intervals for the
objective indices (.06 < $\rho$ < .36) and the subjective ratings
(.04 < $\rho$ < .37) both excluded zero. This finding indicates that
white ratees are rated higher than black ratees and that the level of
performance for whites is higher on the objective performance
indices.

Moderator Analyses 

Table 2 presents the results of the moderator analyses which
compared the race effect size found across the three criterion
categories. The table shows the corrected mean point-biserial
correlations for objective indices and subjective ratings across
the three criterion categories. This allowed for the testing of
differences in effect size for the three types of objective
measures and for the differences between objective and subjective
measures of performance. The tests of differences in effect size
across the three criterion categories for the objective and
subjective indices of performance are provided in the rows of the
table. The comparisons between objective and subjective effect
sizes within each criterion category (i.e., performance indicators,
absenteeism, cognitive criteria) are presented in the three columns
in Table 2.
Insert Table 2 about here

The results of the Z tests indicated that there were
significant differences in effect sizes across the three types of
objective indices (z = 5.63; p < .01). Inspection of the three
average effect sizes revealed that the difference resulted from
the larger effect size for the cognitive criteria than for either
the performance or absenteeism criteria. Differences in effect
sizes for the subjective ratings across the three criterion
categories were nonsignificant (z = 0.45; p > .05).

Within criterion categories, Z tests were calculated for the
comparison of objective and subjective criteria. In the performance
criterion category, the effect size for the objective performance
indicators was significantly smaller than the comparable subjective
rating effect size (z = 3.51; p < .01). Conversely, the effect
size for the objective cognitive criteria was significantly larger
than the effect size for the subjective ratings (z = 2.13; p < .01).
There was no significant difference in the objective and subjective
rating effect sizes for the absenteeism category.
Correlation of Effect Size 

For the overall analysis of 53 samples, the correlation of
objective and subjective effect size was .43 (p < .01). An
examination of the three criterion categories revealed significant
correlations between objective and subjective effect size for the
performance indicators (N = 20; r = .44; p < .05) and cognitive
criteria (N = 16; r = .55; p < .05). The correlation of the two
effect sizes for the absenteeism data was nonsignificant (N = 13;
r = .17; p > .05).

Discussion 

One goal of the present meta-analysis was to document the 
extent of racial differences on objective criteria. The overall 
results showed a relatively small but significant race effect size 
for objective criteria. The investigation of differences across 
types of objective criteria found that the average effect size for 
the cognitive (training and job knowledge tests) criteria was larger
than the effect size for absenteeism and performance data. The mean
effect size was only slightly higher for the performance than for 
the absenteeism data. 

The second goal of the study was to compare effect sizes for 
subjective rating criteria and the objective measures. The results 
indicated that across all studies and criterion categories, the
effect sizes for subjective and objective criteria were virtually
identical. White employees were rated higher and were performing
at a higher level (as measured by the objective criteria) than
black employees. Nevertheless, differences in effect size were
evident between objective and subjective measures at the criterion
category level. First, cognitive tests enlarged the differences
between races relative to a matched set of subjective ratings. 
Second, actual performance indicators revealed smaller differences 
relative to a matched set of subjective ratings. The magnitude of 
effect size was similar for the objective and subjective measures
for the absenteeism category.

Closer inspection of the results in relation to current
perspectives in personnel research reveals a number of interesting
patterns. First, despite evidence from test fairness studies that
blacks typically score about one standard deviation ($r_{pb}$
approximately .50) below whites on pre-hire aptitude tests (Hunter,
Schmidt, & Hunter, 1979) and continue to demonstrate lower performance
on job knowledge exams ($r_{pb}$ = .34 in the present study), subgroup
differences in actual on-the-job performance ($r_{pb}$ = .16) do not
appear as large. This result implies that both aptitude tests for
selection and job knowledge tests measure some construct correlated
with race but somewhat irrelevant to actual job performance
(Wallace, 1973).

Second, while job knowledge has been found to be strongly
related to supervisory ratings (Hunter, 1983), race effects for
ratings are smaller than for cognitive criteria such as job knowledge
tests. This result implies that although differences in job
knowledge may be incorporated in their ratings, raters must use
other factors that have the effect of reducing race effects in
ratings. Since the effect size for ratings is closer to the effect
size found for actual job performance, it could be argued that
performance information is another critical factor incorporated
into ratings. Interestingly, Ford, Schechtman and Kraiger (1985)
found that white raters placed more weight on objective job
performance indices, but similar weight on job knowledge
information, when rating black ratees than when rating white ratees.
Third, race effects tended to covary between objective and
subjective measures (i.e., studies with large (or small) race
effect sizes for ratings tended to have a large (small) effect
size for the same sample of employees on the objective measure).
These results counter assertions that objective measures may be
preferable to ratings as they are less prone to race effects
(Boehm, 1972; Bray & Moses, 1972). On the other hand, the
relatively high degree of consistency in the effect sizes found
across multiple criterion measures suggests that the race effects
found in subjective ratings cannot be solely attributed to rater
bias.

The results of the meta-analysis have implications for test
validity and fairness research as well as for future research on
criterion measurement. Schmidt and his colleagues (e.g., Schmidt
& Hunter, 1981; Schmidt, Hunter, Pearlman, & Shane, 1979; Schmidt,
Pearlman, & Hunter, 1981) have conducted a number of reviews and
studies which indicate that validities are similar for different
races, predictors are fair to minorities and validities are
generalizable across situations. In conducting these studies,
criteria (performance ratings, job knowledge tests, job
proficiency scores) are argued to be substitutable because the
intercorrelation of criterion measures approaches unity when
corrected for measurement reliability (Pearlman, Schmidt, & Hunter,
1980).
The results of the present meta-analysis indicate that
criterion measures are not substitutable in terms of expected race
effect size. For example, the use of a job knowledge test as a
criterion of performance will result in a larger expected race
effect size than for other possible criterion measures such as
actual performance indices or subjective ratings. While ratings
have been criticized for bias, the use of cognitive criteria may
also result in what has been labeled apparent but false
nondiscrimination (Cascio, 1982). In this case, the same factors that
act to depress the performance of a subgroup on predictor aptitude
tests are likely to be present in the job knowledge criteria.
Similarly, Burke (1984) has recently suggested that some component
of generalized validity reflects a spurious association between
predictor and non-job-related biases on the criterion.

Therefore, although it is important to know that validity
coefficients are similar between races and that regression equations
do not underpredict job performance, it is also premature to
consider criteria as substitutable. The present meta-analysis
showed that different measures of performance have somewhat
different relationships to the exogenous variable of race. This
result points to the need for a greater understanding of the factors
or constructs being measured by criterion measures and their
relationships to other (predictor) constructs.

A construct validation approach to criterion measurement
requires the cumulative understanding of a construct that comes
from a sizeable body of empirical data; the kind of data collected
in research exploring the network of associations and situations
in which a measure acts (Nunnally, 1967). The present study has
provided an association of one demographic variable of race to
objective and subjective measures of performance. Future research
is needed which focuses on the job relevant and irrelevant factors
underlying these criterion measures. Without such analyses, it is
difficult to interpret the reasons for racial differences found on
the criterion measures.

For example, differences in actual performance or job knowledge
tests may simply reflect the selection policies of organizations.
Because of differing selection ratios, organizations may be able to
select only high ability applicants from the white applicant pool
but must select from a wider diversity of ability for the minority
applicants. These differences, at the time of selection, are then
reflected on the criterion measures (Kroeck, Barrett, & Alexander,
1983). Actual performance differences may also reflect to some
unknown extent organizational practices such as blacks being placed
on older equipment, given less desirable work territories or sent
into high risk situations in which accidents are more likely to
occur. In this case race acts as an indicator of underlying
sources of job irrelevant variance. An interesting implication
from this perspective is that performance ratings may be highly
"relevant" in the sense that they reflect existing organizational
conditions and practices rather than the inherent biases of the
rater. Racial performance differences may also reflect individual
level factors such as the lack of mentors and self limiting
behaviors (Ilgen & Youtz, 1984) or simply lower job tenure
(Bernardin, 1984) for blacks than for whites.

Unfortunately, data relevant to these issues were not available
for systematic analysis in this sample. Interestingly, O'Connor and
his colleagues (O'Connor, Peters, Pooyan, Weekly, Frank, & Erenkrantz,
1984; Peters, O'Connor, & Rudolf, 1980) have been conducting a line
of research regarding the impact of organizational constraints on job
performance. Such research, while not directly addressing the issues
above, provides a useful model for the type of research needed to
better understand the constructs we are actually measuring when
gathering "objective" measures of performance as well as a better
understanding of the constraints affecting the relationship between
individual differences and job performance.

To increase our understanding of the constructs underlying 
subjective ratings, Guion (1983) has suggested the need for 
longitudinal studies in which records of evaluations, ratee 
characteristics, any changes in rater characteristics and changes 
in circumstances over time are examined for cyclical effects either 
in contextual variables or performance. Another research direction 
is the investigation of the processing of job relevant and job 
irrelevant factors by raters in the evaluation of performance. 
Pettigrew (1979), for example, has suggested the counterintuitive 
premise that a positivity bias is operating in which ratings 
reflect inflation of the majority group's ratings rather than 
an assumed deflation of ratings against minorities. The positivity 
bias argument holds that majority group members receive higher ratings 
because compensatory job irrelevant factors (e.g., familiarity with 
the rater) are considered for majority group members while minority 
group ratings are more reflective of true performance levels 
(Kraiger, 1981). While these behavior patterns have been recognized, 
research on the weighting of factors has not been systematically 
applied to the performance evaluation domain. Multidimensional 
scaling, policy capturing, and information processing boards from 
decision making research (e.g., Billings & Marcus, 1983) provide 
useful methodologies for conducting research on the effects of job 
relevant and irrelevant influences on rater judgments. 

The results of the present meta-analysis, when combined with 
the previous work of Kraiger and Ford (1985), provide a comprehensive 
analysis of race effects in performance criteria. While the data 
upon which this analysis is based should be continually updated, 
research needs to go beyond the simple focus on whether race effects 
occur in criterion measures. Regardless of the specific research 
direction taken relevant to criterion measurement, we conclude by 
paraphrasing Wallace's (1974) advice to observe high validity 
coefficients from test fairness and validity generalization studies 
with more suspicion and less euphoria, to seek instruction from 
them rather than reassurance, to use them to create constructs 
rather than to build empires and to worry about criteria first and 
predictors later. 

Footnotes 

For example, previous attempts to categorize criteria have 
grouped together error counts, attendance, job samples, tenure 
and job knowledge tests (Hunter, Schmidt, & Hunter, 1979) or 
grouped work samples, training, and rating criteria (Schmidt, 
Pearlman, & Hunter, 1980). 

The average effect size for the specific ratings and the overall 
ratings from the same studies was quite similar (x̄ = [illegible], 
σ² = .12; x̄ = .12, σ² = .19; respectively) and analyses conducted 
with only overall ratings yielded results similar to those presented 
for the studies with specific ratings. 
Table 1 

Mean and Variance of Race Effects by Type of Objective and 
Subjective Criterion 

                                        Sample size 
                            No.     Subgroup  Subgroup   Total    Corrected 
                          studies       n         n        N      mean r (a) 

Total sample 
  Objective criteria         53       7405      2817     10222      .209 
  Subjective criteria        53       6791      2652      9443      .204 
Criterion categories 
  Performance indicators 
    Objective                20       3260      1027      4287      .159 
    Subjective               20       3122      1008      4130      .221 
  Absenteeism 
    Objective                13       1529       622      2151      .112 
    Subjective               13       1563       658      2221      .149 
  Cognitive 
    Objective                16       2371      1018      3389      .336 
    Subjective               16       1909       873      2782      .226 

(table continued) 
Table 1 (cont.) 

Mean and Variance of Race Effects by Type of Objective and 
Subjective Criterion 

                                                               Corrected 
                           σ²_r    σ²_e (b)   σ²_ρ       confidence intervals 

Total sample 
  Objective criteria       .045     .006      .040        .058 < ρ < .360 
  Subjective criteria      .018     .007      .011        .037 < ρ < .371 
Criterion categories 
  Performance indicators 
    Objective              .046     .006      .040        .008 < ρ < .310 
    Subjective             .018     .007      .011        .058 < ρ < .384 
  Absenteeism 
    Objective              .017     .008      .009       -.062 < ρ < .286 
    Subjective             .019     .007      .012       -.014 < ρ < .312 
  Cognitive 
    Objective              .041     .004      .036        .205 < ρ < .467 
    Subjective             .017     .007      .010        .065 < ρ < .387 

(a) Corrected for unequal sample sizes and for attenuation. 
(b) Corrected for added variance due to correction to point-biserial 
for unequal sample sizes. 
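
The arithmetic linking the columns of Table 1 can be checked with a 
short sketch. It rests on assumptions that the table itself does not 
state: that the three variance columns are the observed variance, the 
error variance, and the residual variance of the corrected 
correlations, and that the tabled intervals are approximately the 
mean plus or minus 1.96 times the square root of the error variance. 
The second assumption is only an observation that this rule lands 
within about .01 of every tabled limit; it is not a documented 
description of the authors' procedure. The function names are 
hypothetical. 

```python
import math

# Values transcribed from Table 1:
# (label, corrected mean r, observed variance, error variance,
#  tabled residual variance, tabled lower limit, tabled upper limit)
ROWS = [
    ("Total, objective",        0.209, 0.045, 0.006, 0.040,  0.058, 0.360),
    ("Total, subjective",       0.204, 0.018, 0.007, 0.011,  0.037, 0.371),
    ("Performance, objective",  0.159, 0.046, 0.006, 0.040,  0.008, 0.310),
    ("Performance, subjective", 0.221, 0.018, 0.007, 0.011,  0.058, 0.384),
    ("Absenteeism, objective",  0.112, 0.017, 0.008, 0.009, -0.062, 0.286),
    ("Absenteeism, subjective", 0.149, 0.019, 0.007, 0.012, -0.014, 0.312),
    ("Cognitive, objective",    0.336, 0.041, 0.004, 0.036,  0.205, 0.467),
    ("Cognitive, subjective",   0.226, 0.017, 0.007, 0.010,  0.065, 0.387),
]


def residual_variance(var_observed, var_error):
    """Observed variance minus error variance, floored at zero (assumed rule)."""
    return max(var_observed - var_error, 0.0)


def interval(mean_r, variance, z=1.96):
    """Symmetric interval: mean_r +/- z * sqrt(variance) (assumed rule)."""
    half_width = z * math.sqrt(variance)
    return mean_r - half_width, mean_r + half_width


if __name__ == "__main__":
    for label, r, v_obs, v_err, v_res_tab, lo_tab, hi_tab in ROWS:
        v_res = residual_variance(v_obs, v_err)
        lo, hi = interval(r, v_err)
        print(
            f"{label:26s} residual {v_res:.3f} (tabled {v_res_tab:.3f})  "
            f"interval {lo:+.3f} to {hi:+.3f} "
            f"(tabled {lo_tab:+.3f} to {hi_tab:+.3f})"
        )
```

Small discrepancies between the recomputed and tabled figures are to 
be expected, because the tabled variances are rounded to three 
decimal places. 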
Table 2 

Moderator Analyses Across Criterion Categories and for the 
Comparison of Objective and Subjective Criteria 

                                   Criterion category 
                        Performance   Absenteeism   Cognitive 
Criterion type            r̄_c (a)        r̄_c          r̄_c       Z test (b) 

Objective indices          .159*         .112         .336        5.36* 
Subjective ratings         .221          .149         .226        0.45 
Z test                    3.51*          0.80         2.13* 

(a) Entries are mean point-biserial correlations as corrected for 
unequal sample sizes and attenuation for unreliability. 
(b) Z tests are based on Rosenthal and Rubin (1982). 
*p < .01 